Abstract

We estimate the size of a most loaded bin in the setting when the balls are placed 
into the bins using a random linear function in a finite field. The balls are chosen from a 
transformed interval. We show that in this setting the expected load of the most loaded 
bins is constant. 

This is an interesting fact because using fully random hash functions with the same
class of input sets leads to an expectation of $\Theta\left(\frac{\log m}{\log \log m}\right)$
balls in the most loaded bins, where m is the number of balls and bins.

Although this family of functions is quite common, the size of the largest bins was not
known even in this simple case.


1 Introduction 

Our basic task is to estimate the size of a largest bin in a special case of the balls
and bins model. This model simply means that the balls are randomly thrown into bins.
The placement process has been studied in various forms: its randomness, independence and other
properties lead to various bin sizes. The simplest model is to use fully random functions or
some approximation of them to place the balls. There are plenty of results, i.e. estimates
of bin sizes, for various placement processes.

When the balls are thrown independently at random into the bins, the expected size of the
largest bin is $\Theta\left(\frac{\log m}{\log \log m}\right)$.

One of the first results was shown by Carter and Wegman [1], and this model was used to
design universal and perfect hashing. They showed that the expected size of a bin is a constant
when the placement is done by the functions which we will refer to as simple linear functions.
These functions are two-wise independent and thus achieve $O(\sqrt{m})$ expected size of a
largest bin.

It is also possible to use functions with higher degrees of independence and obtain better
bounds. There are lower bounds on the speed of such functions, the space needed to represent
them, the size of the largest bin and the independence they achieve, given by Siegel [2].

The need to improve the size of the largest bins led to the two-choice paradigm. We use two
functions to obtain two candidate bins, and the ball is placed into the less loaded one. In this
model the size of the largest bin is O(log log m), where m is the number of balls and bins, as
shown by Azar et al. [3] and improved by Vöcking.

Nowadays more complicated families of functions are studied in [3]. The functions no longer
rely on a high degree of independence but are designed so that they achieve small largest bins
even with high probability.
Our model uses simple linear functions, and the balls are chosen from an interval
in $\mathbb{Z}_p$. Such a model has largest bins of constant size.

2 Notation and definitions 

We refer to the set $\{0, \dots, k - 1\}$ as $[k]$. In the whole text we assume that p is a fixed prime.
The set of chosen balls is denoted by $S \subset [p]$. The number of bins is the same as the number of
balls and is denoted by m, i.e. $|S| = m$.

For each pair $(a, b) \in [p]^2$ we define the function $h'_{a,b}$ as $h'_{a,b}(x) = (ax + b) \bmod p$ and the
function $h_{a,b}$ as $h_{a,b}(x) = h'_{a,b}(x) \bmod m$.

The multiset of simple linear functions mapping [p] to the range [m] is denoted by $H_{lin}$ and
is defined as $H_{lin} = \{h_{a,b} \mid a, b \in [p]\}$. For a function $h \in H_{lin}$ we define the size of the i-th bin as
$\mathrm{bin}(h, S, i) = |S \cap h^{-1}(i)|$ and the maximal size of a bin as $\mathrm{lbin}(h, S) = \max_{i \in [m]} \mathrm{bin}(h, S, i)$.

In the following text we fix the probability space to be formed by a uniform choice of $h \in H_{lin}$.
The notation bin(S, i) and lbin(S) then refers to the random variables formed by the mentioned
random uniform choice.

For an element x we define the value $l(x, a, b) = \left\lfloor \frac{ax + b}{p} \right\rfloor$; that is how many “leaps” are created
by applying the function $h_{a,b}$ on the element x in the field $\mathbb{Z}_p$.
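The definitions of this section can be written down directly. The following Python sketch is only an illustration: the function names and the values p = 13, m = 4 are chosen here and are not part of the text, and the leap count is taken to be floor((ax + b)/p), which agrees with the identity h(x) = (ax + b - l(x, a, b)p) mod m used in Section 3.

```python
from collections import Counter


def h_prime(a, b, x, p):
    """h'_{a,b}(x) = (a*x + b) mod p."""
    return (a * x + b) % p


def h(a, b, x, p, m):
    """h_{a,b}(x) = h'_{a,b}(x) mod m."""
    return h_prime(a, b, x, p) % m


def leaps(x, a, b, p):
    """l(x, a, b): how many times a*x + b wraps around p (assumed floor form)."""
    return (a * x + b) // p


def bin_sizes(a, b, S, p, m):
    """Map each non-empty bin i to bin(h_{a,b}, S, i)."""
    return Counter(h(a, b, x, p, m) for x in S)


def lbin(a, b, S, p, m):
    """Size of a largest bin of h_{a,b} on the set S."""
    sizes = bin_sizes(a, b, S, p, m)
    return max(sizes.values()) if sizes else 0


# Illustrative parameters only.
p, m = 13, 4
print(lbin(3, 5, range(m), p, m))
```

For a = 3, b = 5 the elements 0 and 3 both land in bin 1, so the largest bin has size 2.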

3 Collision probability for three elements 

We first study the probability of collision of three arbitrary elements. By collision of the elements 
we understand the event when all of the elements are mapped to the same element in [m] by 
the randomly chosen linear function. 

We fix three different elements $x, y, z \in [p]$ and we count the number of pairs $(a, b) \in [p]^2$
such that $|h_{a,b}(\{x, y, z\})| = 1$.

We start by simplifying to the case when x = 0, y = 1 and the third element z = d for a
suitable $d \in [p]$ such that d > 1 depending on the choice of x, y, z.

Lemma 1 (Transformation lemma). Let $x, y, z \in [p]$ be arbitrary different elements. Moreover
assume that $i_x, i_y, i_z \in [m]$. Then there exists an element $d \in [p]$ such that

$\Pr[h(x) = i_x, h(y) = i_y, h(z) = i_z] = \Pr[h(0) = i_x, h(1) = i_y, h(d) = i_z]$.

Proof. The idea of the proof is simple. We show that there is a one-to-one map between simple
linear functions mapping x, y, z to $i_x, i_y, i_z$ and simple linear functions mapping 0, 1, d to
the same elements.

In the first part of the proof we observe that combining simple linear functions with a linear
function in $\mathbb{Z}_p$ does not change the probability space. There is a single linear function
transforming 0, 1 to x, y in $\mathbb{Z}_p$, which we refer to as $h'_{\alpha,\beta}$. Finally we choose d so that $h'_{\alpha,\beta}(d) = z$
and the proof is finished.

We show that the elements x,y,z can be transformed to the elements 0,1, d so that the 
probability of the mappings from the statement of the lemma remains the same. 

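Choose $\alpha, \beta \in [p]$ so that $\alpha \neq 0$. Observe that the mapping $(\gamma, \delta) \mapsto (\alpha\gamma, \beta\gamma + \delta)$ is a
one-to-one map on $[p]^2$. If there is another pair $(\epsilon, \phi)$ such that $(\alpha\epsilon, \beta\epsilon + \phi) = (\alpha\gamma, \beta\gamma + \delta)$,
then $\gamma = \epsilon$ and $\delta = \phi$. Thus the mapping is injective. Also for arbitrary $(r, s) \in [p]^2$ the element
$(\alpha^{-1} r, s - \beta\alpha^{-1} r)$ is mapped to $(\alpha\alpha^{-1} r, \beta\alpha^{-1} r + s - \beta\alpha^{-1} r) = (r, s)$.

The compound function $h'_{a,b} \circ h'_{\alpha,\beta}$ is exactly equal to the function $h'_{a\alpha, \beta a + b}$. The one-to-one
property also follows from the fact that the set of all invertible linear functions in $\mathbb{Z}_p$ forms
a group under the composition of functions.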
Let $H' = \{h'_{a,b} \mid (a, b) \in [p]^2\}$. From the previous we can conclude that the combination
of a function $h' \in H'$ with a fixed function $h'_{\alpha,\beta}$ is a one-to-one map in the space of functions
H'. Also observe that the composition of a function $h_{a,b} \in H_{lin}$ with $h'_{\alpha,\beta}$ cannot change the
probability (count of the functions) of mapping arbitrary three elements to their prescribed
images.

There is also a single pair $(\alpha, \beta) \in [p]^2$, i.e. a single function $h'_{\alpha,\beta}$, transforming the
elements 0 and 1 to x and y in the field $\mathbb{Z}_p$ without taking modulo m. It is the function with
$\beta = x$ and $\alpha = y - x$. To prove the lemma we choose $d \in [p]$ such that $h'_{\alpha,\beta}(d) = z$, i.e.
$d = \alpha^{-1}(z - \beta)$. □

Lemma 1 shows that the probabilistic properties, e.g. collision or mapping to the prescribed
elements, for the elements x, y, z are the same as for the elements {0, 1, d}, where d comes from
the previous lemma.
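The transformation lemma can be checked by brute force for small parameters. The sketch below is an illustration under assumed values (p = 11, m = 3 and the triple 4, 9, 2 are picked here, and the helper names are hypothetical). It computes d = α^{-1}(z - β) with β = x and α = y - x, and compares how often each image triple occurs over all pairs (a, b). The modular inverse uses the three-argument pow with exponent -1, which requires Python 3.8 or newer.

```python
from collections import Counter
from itertools import product


def image_counts(points, p, m):
    """For every (a, b) in [p]^2 record the tuple of images of `points`
    under h_{a,b}, and count how often each tuple occurs."""
    counts = Counter()
    for a, b in product(range(p), repeat=2):
        counts[tuple(((a * t + b) % p) % m for t in points)] += 1
    return counts


def transformed_d(x, y, z, p):
    """d = alpha^{-1} (z - beta) in Z_p, with beta = x and alpha = y - x."""
    alpha = (y - x) % p
    return (pow(alpha, -1, p) * (z - x)) % p


# Illustrative parameters only.
p, m = 11, 3
x, y, z = 4, 9, 2
d = transformed_d(x, y, z, p)
same_distribution = image_counts((x, y, z), p, m) == image_counts((0, 1, d), p, m)
print(d, same_distribution)
```

Equal counts for every image triple mean equal probabilities, since each pair (a, b) is chosen with probability 1/p².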

Next we estimate the collision probability for the elements 0, 1, d.

Lemma 2 (Probability of collision of three elements). Let $d \in [p]$ be an arbitrary element. Then


$\Pr[|h(\{0, 1, d\})| = 1] \le$


l + ii^ax(l,^)(l+^) 
P 


Proof. We count the number of functions $h \in H_{lin}$ such that $h(\{0, 1, d\}) = \{y\}$ for some $y \in [m]$.
For each $x \in [p]$ it holds that $l(x, a, b) \in [x]$ and $h(x) = (ax + b - l(x, a, b)p) \bmod m$.

Whenever the elements 0, 1 and d are mapped to the same element y it must hold that
h(0) = h(1) and h(0) = h(d). Hence


b mod p mod m = (a + b) mod p mod m
b mod p mod m = (da + b) mod p mod m.


From which we obtain the following sequence of equations

$m \mid a + b - l(1, a, b)p - b$
$m \mid da + b - l(d, a, b)p - b$,


$m \mid a - l(1, a, b)p$
$m \mid da - l(d, a, b)p$,


$m \mid da - d\,l(1, a, b)p$

$m \mid (d\,l(1, a, b) - l(d, a, b))p$.


Since p is a prime we conclude that $m \mid d\,l(1, a, b) - l(d, a, b)$. We estimate the collision
probabilities from the two statements following from the previous formulas:

$m \mid a - l(1, a, b)p$   (1)

$m \mid (d\,l(1, a, b) - l(d, a, b))p$.   (2)
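Statements (1) and (2) can be tested exhaustively for small parameters. The following Python sketch is a sanity check, not part of the proof: the helper names and the parameter choices are assumptions made here, and l(x, a, b) is taken to be floor((ax + b)/p). It verifies that every pair (a, b) for which 0, 1 and d collide satisfies both divisibility statements.

```python
def leaps(x, a, b, p):
    """l(x, a, b), assumed to be floor((a*x + b) / p)."""
    return (a * x + b) // p


def collides(a, b, d, p, m):
    """True if h_{a,b} maps 0, 1 and d to the same bin."""
    return len({((a * x + b) % p) % m for x in (0, 1, d)}) == 1


def conditions_hold(a, b, d, p, m):
    """Check statement (1): m | a - l(1,a,b) p, and
    statement (2): m | (d l(1,a,b) - l(d,a,b)) p."""
    l1 = leaps(1, a, b, p)
    ld = leaps(d, a, b, p)
    cond1 = (a - l1 * p) % m == 0
    cond2 = ((d * l1 - ld) * p) % m == 0
    return cond1 and cond2


def collisions_imply_conditions(d, p, m):
    """Exhaustive check over all (a, b) in [p]^2."""
    return all(
        conditions_hold(a, b, d, p, m)
        for a in range(p)
        for b in range(p)
        if collides(a, b, d, p, m)
    )


# Illustrative parameters only.
print(collisions_imply_conditions(5, 13, 4))
```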
The statement (2) roughly means that out of d possible values for l(d, a, b) only the 1/m
fraction may generate the collision of the three elements. Notice that for a fixed $l \in [d]$
the set $\{a \in [p] \mid l(d, a, b) = l\}$ is a subinterval of [p]. From (1) we can observe that only
the 1/m fraction of the possible values of a lying in the appropriate intervals allowed by valid
values of l(d, a, b) cause collisions.

For the rest of the proof fix the value of b. First, we show that the values of a such that
$l(d, a, b) = l \in [d]$ form disjoint intervals in [p], each of size at most $\lceil p/d \rceil$. Then we count the
number of values a in an interval causing collisions using (1). And finally we count the number
of the valid intervals.

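Let l(d, a, b) = l, then it holds that $l \le \frac{da + b}{p} < l + 1$. Immediately we get that
$a \in \left[\frac{lp - b}{d}, \frac{(l + 1)p - b}{d}\right)$. The total number of values of a, i.e. integers, in each valid interval is
at most $\lceil p/d \rceil$. The ceiling must be applied. For example assume an interval of length 1.5
starting at point 0.8: it contains two integer points, 1 and 2. The bound $\lceil p/d \rceil$ is attained,
for instance, whenever the left endpoint $\frac{lp - b}{d}$ is an integer.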
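The interval structure can be observed directly. In the Python sketch below (helper names and parameters are assumptions for illustration, with l(d, a, b) taken as floor((da + b)/p)), the values of a that share a leap count are listed for a fixed b, and the code checks that they are consecutive integers and that there are at most ceil(p/d) of them.

```python
def a_values_with_leaps(l, d, b, p):
    """All a in [p] with floor((d*a + b) / p) == l, in increasing order."""
    return [a for a in range(p) if (d * a + b) // p == l]


def is_contiguous(values):
    """True if the sorted list consists of consecutive integers."""
    return all(nxt - cur == 1 for cur, nxt in zip(values, values[1:]))


def intervals_ok(d, b, p):
    """Check contiguity and the size bound ceil(p/d) for every leap count."""
    bound = -(-p // d)  # integer ceiling of p/d
    for l in range(d + 1):
        values = a_values_with_leaps(l, d, b, p)
        if not is_contiguous(values) or len(values) > bound:
            return False
    return True


# Illustrative parameters only.
print(intervals_ok(5, 7, 23))
```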

Now fix the value $l \in [d]$ such that l(d, a, b) = l. In order to estimate the number of values
of a causing the collisions we split into two cases according to the value of l(1, a, b).


The first case, l(1, a, b) = 0. From the two previous statements we conclude that


$m \mid a$
$m \mid l$.


The second case, l(1, a, b) = 1. As in the first case it must hold that

$m \mid a - p$
$m \mid d - l$.


In both cases, there are at most $\lceil d/m \rceil$ values of l satisfying the second condition. Also for
each satisfying value of l there are at most $\lceil \lceil p/d \rceil / m \rceil$ values of a causing the collision.

In both cases and for each b it holds that the probability of collision of the three elements is 
bounded by 


1 + 


1 + 




P 


l^ + ^ + 0{p^) if p/dm >1 

+4^ + 0 {p-^) otherwise, i.e. ^ < d < p. 


□ 

The worst possible case is for d = 2 and the probability is roughly 1/(2m). When d > p/m,
the formula is a great overestimate as shown in Figure 1.

Corollary 1. Let d < p/m. Then $\Pr[|h([d])| = 1] \le \frac{1}{(d - 1)m} + \frac{1}{m^2} + O(p^{-1})$.

Proof. When all the elements from [d] collide, then the elements {0, 1, d − 1} must collide as well.
The probability of the collision of {0, 1, d − 1} is hence a valid upper bound on the probability of
the collision of the whole interval. The statement is then a direct application of Lemma 2. □

For completeness we show the simple fact that our probability estimate is tight under a
stronger assumption, namely $p > 3m^2$.
Figure 1: The probability of collision of the elements 0, 1, d as a function of d. Notice
that the probability is decreasing in the part where $d \le p/m$ and is almost symmetric. In this
figure m = 512 and p = 21787.


Lemma 3. If $d \le m$ and $p > 3m^2$, then $\Pr[|h(\{0, 1, \dots, d - 1\})| = 1] = \Theta\left(\frac{1}{dm}\right)$.

Proof. For a fixed b, if a < (p − b)/d and m | a, then the elements {0, 1, ..., d − 1} collide. For
each b there are at least $\lfloor (p - b)/(dm) \rfloor$ such values of a.

We conclude that the number of pairs (a, b) making the elements collide is at least

$$\sum_{b \in [p]} \left(\frac{p - b}{dm} - 1\right) = \frac{(p + 1)p}{2dm} - p > \frac{p^2 - 2pdm}{2dm} > \frac{p^2}{6dm}.$$

Thus the resulting probability is at least $\frac{1}{6dm}$.

□
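The lower bound of Lemma 3 can be compared with an exact count for small parameters. The Python sketch below is illustrative only: m = 4 and the prime p = 53 (which satisfies p > 3m² = 48) are chosen here, and the function name is hypothetical. It counts the pairs (a, b) for which the interval {0, ..., d − 1} collapses into one bin and checks that the count is at least p²/(6dm).

```python
def colliding_pairs(d, p, m):
    """Number of pairs (a, b) in [p]^2 mapping {0, ..., d-1} to a single bin."""
    count = 0
    for a in range(p):
        for b in range(p):
            images = {((a * x + b) % p) % m for x in range(d)}
            if len(images) == 1:
                count += 1
    return count


# Illustrative parameters only: p is a prime with p > 3 * m**2.
p, m = 53, 4
for d in range(2, m + 1):
    count = colliding_pairs(d, p, m)
    # Compare the exact probability with the lower bound 1/(6dm).
    print(d, count / p**2, 1 / (6 * d * m))
```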


4 The expected size of most loaded bins 

First we study the role of the parameter b in the hash function $h_{a,b}$.

The following lemma states that the effect of b on lbin(S) is not asymptotic since it more
or less only shifts the largest bin.

Lemma 4. Assume that $a, b \in [p]$ and $S \subseteq [p]$. Then

$\frac{1}{2}\,\mathrm{lbin}(h_{a,b}, S) \le \mathrm{lbin}(h_{a,0}, S) \le 2\,\mathrm{lbin}(h_{a,b}, S)$.

Proof. Let $L \subseteq S$ be the elements of bin y, i.e. $h_{a,b}(L) = \{y\}$. For each $x \in L$ we have that

$$h_{a,0}(x) = ax \bmod p \bmod m = (ax + b - b) \bmod p \bmod m = \begin{cases} ((ax + b) \bmod p - b \bmod p) \bmod m & \text{if } (ax + b) \bmod p \ge b, \\ (p + (ax + b) \bmod p - b \bmod p) \bmod m & \text{otherwise.} \end{cases}$$

Notice that the two possible new bins are either (y − b) mod m or (p + y − b) mod m. The
lemma now follows from the following two observations. First, each original bin is either shifted
and keeps its size or is split into two possibly uneven shifted bins, hence
$\frac{1}{2}\,\mathrm{lbin}(h_{a,b}, S) \le \mathrm{lbin}(h_{a,0}, S)$. Second, each new bin can only contain elements from at most two
different original bins and thus $\mathrm{lbin}(h_{a,0}, S) \le 2\,\mathrm{lbin}(h_{a,b}, S)$. □
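The factor-of-two relation between the largest bins of $h_{a,b}$ and $h_{a,0}$ is easy to test exhaustively. The sketch below assumes small illustrative parameters (p = 17, m = 4 and an interval S picked here) and uses hypothetical helper names. It checks both inequalities of Lemma 4 for every pair (a, b).

```python
from collections import Counter


def lbin(a, b, S, p, m):
    """Size of a largest bin of h_{a,b}(x) = ((a*x + b) mod p) mod m on S."""
    return max(Counter(((a * x + b) % p) % m for x in S).values())


def lemma4_holds(S, p, m):
    """Check lbin(h_{a,b})/2 <= lbin(h_{a,0}) <= 2 lbin(h_{a,b}) for all a, b."""
    for a in range(p):
        base = lbin(a, 0, S, p, m)
        for b in range(p):
            shifted = lbin(a, b, S, p, m)
            if shifted > 2 * base or base > 2 * shifted:
                return False
    return True


# Illustrative parameters only.
print(lemma4_holds(range(2, 9), 17, 4))
```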

For completeness let us mention that the change of the sign of a has almost no effect on
lbin(S).

Lemma 5. Assume that $a \in [p]$ and $S \subset [p]$ such that $0 \notin S$. Then

$\mathrm{lbin}(h_{a,0}, S) = \mathrm{lbin}(h_{p-a,0}, S)$.

Proof. Similarly as in the proof of the previous lemma. Let $L \subseteq S$ be the elements of bin y, i.e.
$h_{a,0}(L) = \{y\}$. Let $x \in L$, then $h_{p-a,0}(x) = (p - a)x \bmod p \bmod m = (p - ((ax) \bmod p)) \bmod m = (p - y) \bmod m$.
Observe that $(p - a)x \bmod p = p - ((ax) \bmod p)$ holds only when $x \neq 0$. The
bin y is thus moved to the bin (p − y) mod m and the lemma holds. □
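The effect of replacing a by p − a can also be checked by enumeration. In the Python sketch below the parameters p = 19, m = 5 and the sets S are illustrative choices made here, and the helper name is hypothetical. It compares the largest bins for a and p − a on a set without zero, and then measures how far they can differ once zero is included.

```python
from collections import Counter


def lbin0(a, S, p, m):
    """Size of a largest bin of h_{a,0}(x) = ((a*x) mod p) mod m on S."""
    return max(Counter(((a * x) % p) % m for x in S).values())


# Illustrative parameters only.
p, m = 19, 5
S_without_zero = range(1, 12)
S_with_zero = range(0, 12)

# On a set without zero the largest bins coincide for every a != 0.
equal_without_zero = all(
    lbin0(a, S_without_zero, p, m) == lbin0(p - a, S_without_zero, p, m)
    for a in range(1, p)
)

# With zero included, record the largest observed difference.
max_gap_with_zero = max(
    abs(lbin0(a, S_with_zero, p, m) - lbin0(p - a, S_with_zero, p, m))
    for a in range(1, p)
)
print(equal_without_zero, max_gap_with_zero)
```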

Obviously allowing zero makes only a negligible change. 

Corollary 2. Let $S \subset [p]$, then

$\mathrm{lbin}(h_{a,0}, S) - 1 \le \mathrm{lbin}(h_{p-a,0}, S) \le \mathrm{lbin}(h_{a,0}, S) + 1$.

For the choice of S = [m] we show that the expected size of a most loaded bin is
O(1). This can be compactly formulated as follows.

Theorem 1. Assume that $p > m^2$, then

$E[\mathrm{lbin}([m])] = O(1)$.

Proof. By Lemma 4 we may assume that the chosen function has b = 0 without asymptotically
increasing the expected size of the largest bin. In the proof of the claims we thus assume that
the chosen linear function is exactly the function $h_{a,0}$. Moreover we assume that $a \neq 0$. Notice
that this assumption adds exactly m/p to the computed expected value, which is O(1).

Observe that each bin is formed by a single arithmetic progression. Notice that since p is a
prime it holds that (−p) mod m is co-prime with m. The reason can be stated as follows. Let
$x_1 < x_2$ be two elements in a single bin, then for $d = x_2 - x_1$ it holds that $m \mid ad \bmod p$ or
$m \mid p - (ad \bmod p)$.

All the solutions of the equation ax mod p mod m = 0 where $x \in [m]$ form a finite arithmetic
progression. For the proof of the previous statement notice that since p is a prime it holds that
(−p) mod m is co-prime with m.

In addition, for a difference d and a given length l, $l \le |S|$, there is a canonical value $x \in [m]$
such that if there is a bin of size at least l, then there is another bin formed by an arithmetic
progression of length at least l with the same difference d having x as the minimal
element. If $ad \bmod p < p/2$, we choose $x = \operatorname{argmin}_{x \in [m - ld]} ax \bmod p$. Otherwise we put
$x = \operatorname{argmax}_{x \in [m - ld]} ax \bmod p$.

After establishing the previous facts we simply compute the expected value of lbin([m])
using the following idea. Now we allow b to have an arbitrary value.

Assume that lbin([m]) > l > 3, then there is an arithmetic progression chosen from [m] of
size at least l/2 collapsing into a single bin; here we use Lemma 4. Since for a fixed difference
and length we have its canonical position, there are at most m/l possible arithmetic progressions
from which we choose. By Corollary 1 we upper bound the probability of the collapse of
the arithmetic progression as
Hence for l > 3 we have

$\Pr[\mathrm{lbin}([m]) \ge l] = O(l^{-2})$.

Then we simply conclude that

$$E[\mathrm{lbin}([m])] \le O(1) + \sum_{l=1}^{m} O(l^{-2}) = O(1).$$

□
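For very small parameters the expectation in Theorem 1 can be computed exactly by enumerating all p² functions. The Python sketch below is only an experiment: the function name and the (m, p) pairs are chosen here for illustration, and such small values cannot confirm an asymptotic statement. It returns E[lbin([m])] as an exact fraction.

```python
from collections import Counter
from fractions import Fraction


def expected_lbin(m, p):
    """Exact E[lbin([m])] over a uniform choice of (a, b) in [p]^2."""
    total = 0
    for a in range(p):
        for b in range(p):
            sizes = Counter(((a * x + b) % p) % m for x in range(m))
            total += max(sizes.values())
    return Fraction(total, p * p)


# Illustrative (m, p) pairs; each p is a prime larger than m**2.
for m, p in [(4, 67), (8, 131)]:
    print(m, p, float(expected_lbin(m, p)))
```

Exhaustive enumeration costs about p²·m steps, so for larger p a random sample of pairs (a, b) would be the natural replacement.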

We can conclude the main result, i.e. each set transformable to [m] in Zp has constant sized 
largest bins. 

Corollary 3. Let $S \subseteq [p]$, $a, b \in [p]$. If $\forall x \in [m]: (ax + b) \bmod p \in S$, then $E[\mathrm{lbin}(S)] = O(1)$.

Proof. Direct corollary of Theorem 1 since by Lemma 1 (extended to all the elements of S) the
probabilistic properties of S do not change under the transformation $x \mapsto (ax + b) \bmod p$. □